Mon. Not. R. Astron. Soc. 000, ??-?? (201 1) Printed 22 December 201 1 (MN KTgX style file v2.2) 



Powellsnakes II: a fast Bayesian approach to discrete object 
detection in multi-frequency astronomical data sets 



^ : Pedro Carvalho, 1 * Graca Rocha, 2,3 f M.P. Hobson, 1 ^ A. Lasenby 1,4 ^ 

a 1 Astrophysics Group, Cavendish Laboratory, J.J. Thomson Avenue, Cambridge CB3 ORE, U.K. 
2 California Institute of Technology, Caltech, Pasadena, California, U.S.A. 
q ■ 3 Jet Propulsion Laboratory, JPL, 4800 Oak Grove Drive, Pasadena, California 91109 
' 4 Kavli Institute for Cosmology Cambridge, Madingley Road, Cambridge, CB3 OHA, U.K. 



o 

(N 

o 

u 

43 
6 



> 
oo 

00 



Accepted — . Received — ; in original form 22 December 201 1 



ABSTRACT 

Powellsnakes (PwS; Carvalho et al. 2009) is a Bayesian algorithm for detecting compact 
objects embedded in a diffuse background, and was selected and successfully employed by 
the Planck consortium (Planck Collaboration 2011, a) in the production of its first public 
deliverable: the Early Release Compact Source Catalogue (Planck Collaboration 2011, b) 
(ERCSC). We present the critical foundations and main directions of further development of 
PwS, which extend it in terms of formal correctness and the optimal use of all the available 
information in a consistent unified framework, where no distinction is made between point 
sources (unresolved objects), SZ clusters, single or multi-channel detection. An emphasis is 
placed on the necessity of a multi-frequency, multi-model detection algorithm in order to 
achieve optimality. 
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1 INTRODUCTION 

The detection and characterisation of discrete objects is a common 
problem in many areas of astrophysics and cosmology. Indeed, ev- 
ery data reduction process must resort to some form of compact ob- 
ject detection, since either the objects themselves are the goal of the 
study or they act as contaminants and therefore must be removed. 
In such analyses, the key step usually involves the separation of 
a localised object signal from a diffuse background, defined as all 
contributions to the image aside from the objects of interest. 

A well-established method to address this issue is to assume 
that most of the pixels are part of the background exclusively 1 , the 
background is smoothly varying, i.e. has a characteristic length- 
scale much larger than that of the objects of interest and the object 
are bright compared with the background. A successful example of 
an object detection algorithm based on these assumptions is SEx- 
tractor (Bertin & Arnouts 1996). Its first step is to estimate the 
image background. The algorithm builds up an intensity histogram 
iteratively and clips it around its median. The resulting mesh (re- 
sembling a 'swiss-cheese') is then bilinearly interpolated to fill in 
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1 This is possible only if the fields are not very densely packed with objects. 



the holes. After this background has been subtracted, the detection 
and characterisation of the objects is performed either by looking 
for sets of connected pixels above a given threshold or by boost- 
ing the image maxima with the help of an 'on-the-fly' convolution 
using a pre-defined kernel or the beam PSF. 

Despite their general acceptance, such methods run into diffi- 
culties when the characteristic extension of the fluctuations of the 
diffuse background match the size and the amplitudes of the ob- 
jects of interest. Moreover, problems also arise when dealing with 
low or very low signal-to-noise-ratio (SNR) data, when the rms 
level of the background is comparable to, or even somewhat larger 
than, the amplitude of the localised objects of interest. A good ex- 
ample of this situation is the detection of the Sunyaev-Zel'dovich 
(Sunyaev & Zeldovich 1972) (SZ) effect in galaxy clusters, which 
have characteristic scales similar to that of the primordial CMB 
emission, and at the same time are very faint and extended. In such 
cases, traditional methods fail to provide a statistically supported 
prediction about the uncertainties on the parameter estimates. 

The standard approach for dealing with such difficulties is to 
employ linear filtering, which is an extremely well-developed field, 
very firmly rooted in the principles of the orthodox school of statis- 
tics and signal processing (Van Trees 2001). These methods usu- 
ally start by applying a linear filter tp(x) to the original image d(x), 
and instead analyse the resulting filtered field. The filter is most of- 
ten constructed by assuming a given (possibly parametrised) spatial 
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template, t(x), for the objects of interest. Depending on the appli- 
cation, this profile may contain parameters (to be estimated) and 
already include the beam spreading effects. The common design 
goals for the filter follow the traditional, orthodox figures of merit: 
unbiasedness and efficiency. The optimal solution under these con- 
straints is well-known to be the matched filter (MF; North 1943). 
One may consider the filtering process as optimally boosting (in 
a linear sense) the signal from discrete objects, while simultane- 
ously suppressing the emission from the background. The filtering 
methodology has yet another major advantage of being extremely 
fast and very simple to implement using 'off-the-shelf routines 
(such as FFTs). The uncertainties in the parameter estimates are 
usually obtained from simulations. In practice, however, implemen- 
tations of the filtering codes must be supported by ancillary steps 
in order to cope with the artifacts introduced as a consequence of 
the statistical description of the detection process being incomplete 
(Lopez-Caniego et al. 2007; Melin et al. 2006). 

A natural evolution of the MF, the matched multi-filter 
(MMF), follows exactly the same underlying principles and extends 
them to multi-channel data sets (Herranz et al. 2002; Lanz et al. 
2010). The simultaneous multi-frequency analysis of a set of im- 
ages has the immediate advantage of exploiting the objects' distinc- 
tive spectral signature, if any. Two further advantages of this tech- 
nique are: (i) it boosts the signal from the objects of interest simply 
by adding more data; and (ii) it improves the elimination of the 
background components by taking advantage of their correlation 
between channels. Once again, the thermal SZ effect embedded in 
primordial CMB emission provides a very good example. Owing 
to the well-defined and unique frequency dependence of the SZ ef- 
fect, it is possible to design a filter that combines multi-frequency 
maps to make possible the extraction of deep catalogues even if the 
SZ component is sub-dominant in all the channels (Planck Collab- 
oration 2011, c). 

Further development of traditional filtering techniques in- 
cludes the 'scale-adaptive filter' (SAF; Sanz et al. 2001; Herranz 
et al. 2002), in which the physical scale of the objects of interest is 
added as an extra degree of freedom and an additional condition for 
optimality is added in the derivation of the filter. Schafer & Bartel- 
mann (2007) generalised the scale-adaptive filter to the spherical 
topologies and added multi-channel support. 

A very popular member of the filter family is the wavelets 
group, in particular the mexican-hat (MexHat) wavelet of order 1. 
Indeed, this MexHat wavelet is the MF or the SAF solution under 
particular assumptions about the statistical properties of the back- 
ground and the objects profile (Sanz et al. 2001). Since such con- 
ditions hold very well in modern cosmological data sets, such as 
those obtained from WMAP (Bennett et al. 2003) or Planck, and 
the simplicity of the function allows easy and robust engineering, 
the MexHat wavelet has been a favourite detection tool of many 
authors (Gonzalez-Nuevo et al. 2006; Lopez-Caniego et al. 2006). 
Nonetheless, obtaining good results with the MexHat filter is ex- 
tremely dependent on the value of the acceptance/rejection thresh- 
old. The only way to ensure optimal performance is to run the code 
on realistic simulations and then assess the code's yield against the 
simulation's input catalogue, but a large number of runs is needed 
to fine-tune the threshold value. Exactly the same procedure must 
be followed to determine the uncertainties on the parameter esti- 
mates. This may not seem a severe limitation, since immense com- 
puting resources are now cheaply available. Given the increased 
level of accuracy and complexity of current cosmological data sets, 
however, simulations must be rather sophisticated to provide a re- 
alistic test bed, and so even the enormous computational resources 



available are not sufficient to cope with the massive throughput de- 
manded. For example, a single realistic Planck simulation (FFP) 
takes about one full week to run on a very large cluster and to 
have reasonable estimates of the parameter uncertainties and detec- 
tion thresholds, at least several hundred independent simulations 
are needed. 



To overcome these limitations of linear filtering methods, 
Hobson & McLachlan (2003) introduced a detection algorithm 
based on a Bayesian approach. As with the filtering techniques, 
the method assumed a parameterised form for the objects of inter- 
est, but the optimal values of these parameters, and their associated 
uncertainties, were obtained in a single step by evaluating their full 
posterior distribution. Another major advantage of this method is 
the consistent inclusion of physical priors on the parameters defin- 
ing the objects and on the number of objects present, which im- 
prove the detection efficiency. Although this approach represented 
a further step in the direction of bringing a more solid statistical 
foundation to the object detection/characterization problem, its im- 
plementation was conducted using a Monte-Carlo Markov chain 
(MCMC) algorithm to sample from a very complex posterior dis- 
tribution with variable dimensionality (dependent on the number of 
objects). This technique therefore proved extremely computational 
intensive. Despite the considerable progress that has recently been 
made towards increasing the efficiency of sampling-based Bayesian 
object detection methods (Feroz & Hobson 2008), such algorithms 
are still substantially slower than simple linear filtering methods. 
In a recent work, Argiieso et al. (201 1) suggested a semi-analytical 
hybrid Bayesian maximum a posteriori (MAP) scheme to overcome 
the complexity and the massive resources required the Hobson & 
McLachlan method. Its main advantage is really simplicity. How- 
ever, the method still relies on the MF to find the source's positions 
and, therefore, it does not embody the full Bayesian logic. Mean- 
while, Carvalho et al. (2009) and Feroz et al. (2009) have moved 
one step further towards the theoretically-optimal Bayesian solu- 
tion by exploring the use of evidence ratio methods, which are the 
optimal decision-making tools (see section 2.2), rather than simply 
adopting the MAP solution. 



Our proposal here is to blend detection strategies, i.e. multi- 
channel filtering, Bayesian posterior sampling and evidence ratio 
evaluation, into a rigorous, hybrid, multi-model scheme (as op- 
posed to traditional binary models). This novel methodology is si- 
multaneously general, formally and statistically firmly grounded, 
and overcomes the computation inefficiencies of the pure sampling 
methodologies. This is PowellSnakes II. 



The structure of this paper is as follows. In Section 2, we 
give an overview of Laplace-Bayes probability theory and its close 
relationship with decision theory as a consistent inference and 
decision-making device. Our data model and the different con- 
stituents of the Bayesian framework, namely the likelihood and pri- 
ors are discussed in Section 3, and in Section 4 we bring together 
these elements and recommend an implementation strategy based 
on the exploration of the properties and symmetries of the posterior 
manifold. We also identify problems that may arise and we suggest 
effective ways of tackling them using the Bayesian formalism. Fi- 
nally we present our conclusions and directions for future work in 
Section 6. 
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2 BAYESIAN INFERENCE 
2.1 Basic tools 

The Bayesian system of inference is the only one that provides a 
consistent extension of deductive logic ( { 0=false, 1= true} ) to a 
broader class of 'degrees-of-belief by mapping them into the real 
interval [0, 1] (Jaynes 2003). Combining the multiplication rule 
together with the associativity and commutativity properties of the 
logical product, one may write the equation which will give us the 
posterior probability of a set of parameters (©) taking into account 
the data (d) and the underlining hypothesis (H). This equation is 
also known as Bayes theorem 



Pr(0|d,H) 



Pr(d| ®,H)Pr(®\H) 
Pi(d\H) 



(1) 



where, for brevity, we denote Pr(©|d, H) = P(&) as the pos- 
terior probability distribution of the parameters, Pr(d| ®,H) = 
£(©) as the Likelihood, Pv(®\H) = 7r(©) as the prior and 
Pr(d\H) = Z as the Bayesian evidence. The (unnormalised) pos- 
terior distribution is the complete inference of the parameter values 
©, and thus plays the central role in Bayesian parameter estima- 
tion. 

The normalised posterior distribution may be easily obtained 
by integrating over all possible values of the parameters and equat- 
ing the resulting expression to one, and from this argument one can 
easily see that the evidence is given by 



(2) 



Z = Pi{d\H)= £(®)Tr(®)d*®, 



where K is the dimensionality of the parameter space. Inspecting 
this expression, one immediately recognizes that the evidence is 
the expectation of the likelihood over the prior, and hence is central 
to Bayesian model selection between different hypothesis Hi. We 
note that the evidence evaluation requires the prior to be properly 
normalised. 



2.2 Decision theory 

Probability theory defines only a state of knowledge: the posterior 
probabilities. There is nothing in probability theory per se that de- 
termines how to make decisions based on these probabilities. In- 
deed, a range of actions is always possible, even when using the 
same state of knowledge, because the cost of making a wrong deci- 
sion usually changes according to the kind of problem under anal- 
ysis. For example, in the case of object detection, one often con- 
siders each type of error, i.e. an undetected object or a spurious 
detection, as equally bad. For a moment, however, suppose we in- 
stead wished to determine whether or not a certain person was im- 
mune to a certain pathogen. Failing to detect a previously acquired 
immunity would only cost the price of an extra vaccine, but fail- 
ing to determine that someone was not immune could seriously put 
her/his life at risk. Thus, even with the same degree of knowledge, 
the cost of choosing incorrectly is not the same in every case. To 
deal with such difficulties, one must apply decision theory (DT), 
which we now briefly summarise. 

To apply decision theory, one must first define the loss/cost 
function L(D, E) for the problem at hand, where D is the set of 
possible decisions and E is the set of true values of the entities one 
is attempting to infer. In general, these entities can be either con- 
tinuous parameters or discrete hypotheses, and so decision theory 
can be applied equally well both to parameter estimation and model 



selection. The loss function simply maps the 'mistakes' in our esti- 
mations/selections, D, into positive real values L{D, E), thereby 
defining the penalty one incurs when making wrong judgments. 
The Bayesian approach to decision theory is simply to minimise, 
with respect to D, the expected loss 



{L{D,E)) = J J L(D, E) Pr(D,E)dDdE. 



(3) 



2.2.1 Parameter estimation 

In the estimation of a set of continuous parameters 2 , the 'decisions' 
D are the parameter estimates © and the 'entities' E are the true 
values ©* of the parameters. Typically, the loss function is taken 
to be a function of the difference, or error, e = — 0* . 

Some popular choices of loss functions are: (i) the square error 
e 2 ; (ii) the absolute error |ej; and (iii) the uniform cost inside error 
bar, i.e. unity if |e| > A and zero if |e| < A, where A is some 
pre-defined small quantity. In each case, one can easily find the 
optimal estimator by minimising the expected loss (3) with respect 
to 0. The solutions are, respectively: (i) the posterior mean; (ii) the 
posterior median; and (iii) the posterior mode. 

The most popular choice of loss function among the astronom- 
ical community is the square error e 2 . When detecting astronomi- 
cal objects, however, the requirements are usually not those of the 
square error function, which puts an extreme emphasis on values 
very far from the true ones. This extreme sensitivity to the outliers 
makes the posterior mean estimator less robust than, for example, 
the posterior median, which is much more resilient to outliers. An 
even better choice would be not penalise the estimates at all if they 
fall within a small neighbourhood A around the true parameters 
values and prescribe a constant penalty otherwise. This is precisely 
the 'uniform cost inside error bar' loss function decribed above. 
This loss criterion closely matches what we would intuitively ex- 
pect when assessing the quality of a detection algorithm. For exam- 
ple, if the estimated value of a source flux lies outside the allowed 
range it does not matter how far it lies from the true value, since it 
will always be counted as a spurious detection (Planck Collabora- 
tion 2011, b). 

2.2.2 Interval estimation 

In addition to an estimate 0, one typically summarises the in- 
ference implied by the full posterior distribution by quoting ei- 
ther joint or marginalised confidence intervals (or, more precisely, 
Bayesian credible intervals). One could, in principle, obtain an op- 
timal interval by employing an approprite loss function, but a sim- 
pler approach is now widely accepted, namely the high probabil- 
ity density interval (HPD). The HPD interval containing the frac- 
tion (1 — a) of the total probability is defined such that: Pr(© G 
HPD|d, H) = 1 - a and, if ©i € HPD and 2 £ HPD, then 
Pr(®i\d,H) > Pr(© 2 |d, H). 

2 In astronomical object detection, the majority of the interesting param- 
eters (flux, position, geometry, etc.) are continuous and real valued. More- 
over, discrete parameters can always be handled within the same framework 
by resorting to delta Dirac functions. 

3 The posterior mean is the Bayesian optimal estimator under a very broad 
class of reasonable loss functions. When the posterior distribution is Gaus- 
sian all three common estimators match, and the posterior mode is often 
the simplest to compute. Nonetheless, if the parameter space is, in prac- 
tice, discrete (e.g. pixelisation), the posterior mean might provide a hyper- 
resolution estimate (sub-pixel accuracy). 
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The characterization of the HPD interval may be easily ob- 
tained by sampling from the posterior distribution. When the pos- 
terior distribution is known to be Gaussian or close to it, which is a 
very common case, the ±rms interval is usually quoted instead. 

2.2.3 Model selection and catalogue making 

In model selection, the decision theory 'entities' E are the hypothe- 
ses under consideration and the 'decisions' D are the chosen hy- 
potheses, such that L(Di,Hj) = Lij is the loss associated with 
the decision Di = choose Hi, when Hj is true. Thus, inserting 
this form for the loss matrix into the right hand side of equation (3) 
and performing the integration using the delta Dirac functions to 
represent discrete values as infinite densities, the average loss reads 



(L{D,H)) = Y,La Pr(A,H 3 



(4) 



If, for example, one is interested in distinguishing between 
a null hypothesis Ho and a given alternative hypothesis Hi from 
amongst a wider collection, then typically the loss function has the 
form 







Lij — 



positive value 
positive value 




ifi = j (no penalty if correct) 
if i = 1, j ^ i (false positive) 
if j = 1, i ^ j (false negative) 
otherwise (alternative selection error) 

(5) 

Minimizing (4) is not a difficult task (Van Trees 2001), but the 
general case above leads to long and cumbersome expressions that 
we shall not explore now. 

Much simpler and enlightening, but still capable of a very 
broad and interesting range of practical applications, is the binary 
case consisting of just two hypotheses Ho and Hi. In this case, the 
decision criterion that minimises the expected loss is 



In 



Pr(#i|d) 



«1 
Ho 



where £ = In ; 



_Pr(H \d)_ 
. The 'posterior odds' ratio 
Pr(ffi|d) Zi. Pr^) 



(6) 



(V) 



Pi(H \d) Z Pr(ffo) 

gives the posterior probabilities of the models given the data and 
is a very commonly-used quantity in Bayesian model selection. In- 
deed, Jaynes asserts that the best way to decide between two models 
is by computing the posterior odds and compare it against a thresh- 
old. Using DT we have recovered this result and, at the same time, 
given it a precise statistical meaning and also defined a threshold 
for decision making based on the loss criterion. 

Unfortunately, in astronomy it is often not possible to assign 
meaningful values to the loss. In particular, in object detection and 
catalogue making, astronomers like instead to measure the quality 
of a catalogue in terms of the expected/maximum contamination 
(false positive rate) and the expected/minimum completeness (true 
positive rate). There is, of course, a connection between this ap- 
proach and DT, but quantifying it is not trivial. Nonetheless, there a 
very simple and powerful way to to define the acceptance/rejection 
threshold in Bayesian catalogue making, based on the probabili- 
ties of the different errors that might occur (i.e. spurious or missed 
detections), but we shall postpone its discussion until Section 4. 

Before moving on, it is worth mentioning that, if one ignores 
the (often crucially important; (Riley et al. 2006, ch. 30, pag. 
1132)) factor Pr(//i)/ Pr(Tio) in (7), the remaining evidence ra- 
tio Z\ /Zo depends only on the data and can thus be viewed as an 



orthodox statistic. As such, the properties of its sampling distribu- 
ton can be investigated using standard frequentist tools, such as the 
'power' (true positive rate) Pr(Z)i|i/i) and the 'type I error rate' 
(false positive rate) Pr(Di|i/o) (Jenkins & Peacock 2011). Such 
analyses overlook, however, that the evidence ratio is the optimal 
decision rule. The only degree of freedom remaining is the choice 
of a threshold, which determines whether it is preferable to have 
fewer (more) detections at the cost of good (poor) rejection; there 
is no way of decreasing both error rates simultaneously because 
the evidence ratio is already the most discriminating statistic. The 
claim by Jenkins & Peacock (201 1) that the evidence ratio test is 
not 'powerful' results from them fixing the threshold in an arbitrary 
way; it is this threshold that controls the balance between different 
error rates, and not the statistic itself. A better way of measuring the 
quality of a binary classifier based on some statistic is to allow the 
threshold to vary and plot the resulting true positive rate against the 
false positive rate. This produces the receiver operating character- 
isic (ROC) curve of the classifier. A common measure of classifer 
quality is the area under the ROC-curve (the AUC statistic), which 
obviously does not rely on choosing a single threshold. One may 
show that the AUC is equal to the probability that the classifier will 
rank a randomly chosen data set generated from H\ higher than a 
randomly chosen data set generated from H . 



3 BAYESIAN OBJECT DETECTION 
3.1 Data model 

The specification of the PwS statistical model for a single- 
frequency observation of localised objects embedded in a back- 
ground is given in Carvalho et al. (2009). This can be straight- 
forwardly extended to accommodate multi-frequency observations. 
At each observing frequency v, PwS treats the observed data d v {x) 
as the superposition of a 'generalised' noise background n' v (x) = 
b v (x) + n v (x), consisting of background sky emission and instru- 
mental noise, plus a characteristic signal s v (a:) coming from the 
sources. For ease of notation, we will collect the fields at differ- 
ent frequencies into vectors. Moreover, the signal and background 
components in each frequency channel are assumed to have been 
smoothed with a known beam, which may differ between channels. 
The resulting model for the data vector d{x) reads 



d(x) = Sj(x; &j) + b(x) + n(x) 



(8) 



where iV s is the number of sources, Sj (x; &j) is the signal vector 
due to the jth source, which depends on the parameter vector 0j 
characterising the object, b(x) is the signal vector due to the diffuse 
astronomical backgrounds, and n(x) is the instrumental noise vec- 
tor. The astronomical backgrounds denoted by b(x) are expected 
to exhibit strong correlations between different frequency channels, 
whereas the instrumental noise n(x) is expected to be uncorrelated 
between frequency channels, and also between pixels in the case of 
simple white noise. 4 



4 The condition of the instrumental noise being white is not necessary. The 
general case of correlated noise between pixels does not complicate the 
mathematical development, but can increase computational expense. In any 
case, the assumption of white noise applies extremely well to Planck data 
on the small scales of interest used for the identification of localised objects. 
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We write the signal vector due to the jth source in (8) as 

a j (x;& j ) = A j f(<f> j )T(x-X j ;a j ), (9) 

where the vector r (x — Xj ; aj ) denotes the convolved spatial tem- 
plate at each frequency of a source centred at the position Xj and 
characterised by the shape parameter vector aj, the vector / con- 
tains the emission coefficients at each frequency, which depend 
on the emission law parameter vector <j>^ of the source (see be- 
low), and Aj is an overall amplitude for the source at some cho- 
sen reference frequency. Thus, the parameters to be determined for 
the jth source are its overall amplitude, position, shape parame- 
ters and emission law parameters, which we denote collectively by 
&j = {Aj, Xj, aj, 4>j}. The totality of these parameters, for all 
the sources present, plus the number of sources N s , are concate- 
nated into the single parameter vector 0. For convenience, we de- 
note the signal vector generated by all the sources by 



s{x;&) = ^s J {x;® J 



(10) 



3 = 1 



The nature of the emission law parameter vector depends 
on the class of object under consideration. PwS analyses the data 
assuming that all the objects belong to a single class, and repeats 
the analysis for each class of interest. The assignment of individ- 
ual sources to a particular class is then performed via a model 
selection step (see Section 4.5). The number and specification of 
classes can be arbitrary, including, for example, SZ clusters, point 
sources, Galactic objects, etc. Previous multi-frequency versions of 
PwS have been limited to the case where all objects share the same, 
fixed emission law. SZ clusters fall exactly in this category as, ig- 
noring the relativistic corrections, they all follow exactly the same 
spectral signature (Birkinshaw 1999), which does not depend on 
any parameters. For extragalactic point sources, however, the emis- 
sion law is phenomenological and can vary between sources. Con- 
sequently, PwSII has been extended to accommodate such cases. 
For example, two important families of extragalactic point sources 
in Planck data are as follows. 

• Radio sources are the dominant family of point sources for all 
Planck channels up to and including 143 GHz. Based on the work 
of Waldram et al. (2007), we assume an emission law for such 
objects of the form 



In/, 



a In ( — 



ln[- 



(11) 



where <p — {a, (3} are spectral parameters that can vary between 
sources, and v is the reference frequency (note that /„ = 1 at 
v — vq). Setting /3 = 0, recovers the commonly-assumed power- 
law spectral behaviour with spectral index a. The more general 
form (11) accommodates most of the common types of radio- 
source spectra, namely: flat, steep, and inverted. 

• Dusty galaxies dominate the Planck highest frequency chan- 
nels, starting at 217 GHz up to 857 GHz. Their spectral behaviour 
may be represented to very good accuracy using the well-known 
greybody model 

BAT)' 



In/, 



m|i! 

v 



+ ln 



B V0 {T) 



(12) 



where the spectral parameters <p — {/3, T} are the dust emissivity 
and temperature respectively, B V (T) is the Planck law of black- 
body radiation and i>o is once again the reference frequency (Ser- 
jeant & Harrison 2005). We have again normalised (12) such that 
f v = 1 at v = v . 



3.2 Likelihood 

The form of the likelihood is determined by the statistical prop- 
erties of the generalised noise (background sky emission plus in- 
strumental noise) in each frequency channel. As in PwSI, we will 
perform our analysis in sufficiently small patches of sky such that it 
is not unreasonable to assume statistical homogeneity. In this case, 
it is more convenient to work in Fourier space, since there are no 
correlations between the Fourier modes of the generalised noise, 
which leads to considerable savings in computation and storage. 
Moreover, we will assume that both the background emission and 
instrumental noise are Gaussian random fields. This is a very ac- 
curate assumption for instrumental noise and the primordial CMB, 
but more questionable for Galactic emission. 

We are, in fact, interested only in the likelihood ratio between 
the hypothesis H s that objects (of a given source type s) are present 
and the null hypothesis Ho that there are no such objects. The latter 
corresponds to setting the sources signal s(x; ©) to zero. Under 
our combined assumptions, the log-likelihood ratio has the form 



In 



£if 8 (0) 



LW»)J 



■n 

-iXVfoejAT^Jafoe), (13) 



where the tilde denotes a Fourier transform, the usual mode 
wavenumber k = 2nri, and the matrix J\f(r)) contains the gen- 
eralised noise cross-power-spectra. 

From (9) and (10), the Fourier transform of the signal due to 
all the sources may be written 



s{r r ,&) = B{r 1 )Y j Ajf{<j, j )T{-r 1 ;aj)e 



(14) 



where the vector B(rf) contains the Fourier transform of the beam 
at each frequency and t(tj; a) is the Fourier transform of the tem- 
plate for an unconvolved object at the origin, characterised by the 
shape parameters a. 

Substituting (14) into (13) and rearranging, one obtains the 
final form for the likelihood ratio, which we will use throughout, 
namely 



In 



I A^- 1 [Vj(r,)r(- V ; aj )] x - \A) £ Qjj(r,)\r(v; <*i 



- g [AiAjJ- 1 [Qij( V )r( V ; a^-rr, aj)] Xi _ x . } , (15) 

i>j 

where J r ~ 1 [- . .]« denotes the inverse Fourier transform of the 
quantity in brackets, evaluated at the point x, and we have de- 
fined the quantities Vj(rj) = d (ri)Af^ 1 (r])ip(r]) and Qij(rj) = 
i/> i (r7)A/ 1 irfjtp ^Ct)' in which the vector ipi(r]) has the compo- 
nents (f/'J, = B v (r))(f i ) l ,, with v labelling frequency channels. 

We have written the likelihood ratio in this way since it com- 
bines multi-channel data into a single equivalent channel. More- 
over, it highlights the importance of the final 'cross-term' on the 
RHS of (15). Let us assume for a moment that this cross-term is 
negligible. In this case, the parameters of each source enter the 
likelihood independently. This parameter independence allows us 
to perform our analysis one source at a time and forms the basis of 
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the 'single source model' discussed in Section 4.1, which greatly 
simplifies the source detection problem. The physical meaning of 
the neglected cross-term is most easily understood by consider- 
ing the simple, but important, example of point sources, for which 
t(x, a) = 5(x). In this case, the cross-term in (15) becomes 



X)A i A J -^ 1 [Qy(n)]x 4 



(16) 



A sufficient condition for this expression be small is that all the 
sources are sufficiently well-separated that [Qij(r))] x is close 
to zero for such distances. For simple, uncorrelated backgrounds, 
Qij (tj) contains just linear combinations of the instrument beams 
in each frequency channel. Thus, the condition that (16) is small is 
just a generalization of the common assumption in astronomy that 
objects are well separated, or that object blending effects are negli- 
gible 5 . When detecting point sources, and assuming the blending is 
not severe, an efficient implementation of the full deblending term 
is possible, but this will be addressed in a forthcoming publication. 

It is worth noting that maximising the likelihood ratio (15), 
in the absence of the cross-term (16), with respect to the source 
amplitudes Aj, gives 



F- 1 [Vj{v)T{-n;a k 



(17) 



which recovers the expression for the matched multi-filter (MMF) 
(Herranz et al. 2002). Thus, we see that the filtered field is merely 
the projection of the likelihood manifold onto the subspace of posi- 
tion parameters Xj . This identification further allows one straight- 
forwardly to estimate the uncertainties on all the MMF parameter 
estimates simultaneously by calculating and inverting the Hessian 
matrix of the likelihood at its peak(s). This should be contrasted 
with traditional approaches to MMF in which the uncertainty on 
the estmated source flux is calculated assuming the values of all 
other parameters are fixed (Melin et al. 2006). 

Moreover, substituting the maximum-likelihood estimate (17) 
into the expression (15) for the likelihood ratio, one obtains for the 
jth object 



|^Q^(r7)|r(r 7 ;a j )| 2 ^ = iSNR 



(18) 

where SNRj is the signal-to-noise ratio (at the peak) of the jth 
source, and the rms a of the noise satisfies 



(19) 



Thus, one sees that in the traditional approach to catalogue mak- 
ing, in which one compares the maximum SNR of the putative de- 
tections to some threshold, one is really performing a generalised 
likelihood ratio test. 



3.3 Priors 

If the data model provides a good description of the observed data 
and the signal-to-noise ratio is high, then the likelihood will be very 
strongly peaked around the true parameter values and the prior will 

5 When the background is uncorrelated, this condition is immediately ful- 
filled if each pixel contains signal coming from one and only one source. 
However, this is not the case when there are strong correlations in the back- 
ground as in the case of Planck. 



have little or no influence on the posterior distribution. At the faint 
end of the source population, however, priors will inevitably play 
an important role. Moreover, since for most cases in astronomy the 
faint tail overwhelmingly dominates the population, the selection 
of the priors becomes important and has to be addressed very care- 
fully. 

PwSII separates the tasks of source detection (deciding 
whether a certain signal is due to a source) and source estimation 
(determining the parameters of the source). This separation has the 
advantage of allowing the use of different sets of priors at each 
stage. Typically, we first perform the source detection step using 
'informative' priors, which encompass all the available informa- 
tion, since they provide the optimal selection criterion and the op- 
timal estimators. After the set of detections has been decided, PwS 
proceeds to the estimation pass, in which 'non-informative' priors 
may be used instead. 

Non-informative priors are constructed such that the max- 
imum a posteriori (MAP) estimator of any quantity should de- 
pend exclusively on the data. 6 One way of expressing this condi- 
tion is that, when changing the data, the likelihood shape remains 
unchanged and only its location in the parameter space changes 
(Box & Tiao 1992). Thus, the idea is to find an appropriate re- 
parametrization of the likelihood that transforms the parameters 
into location parameters, for which the ignorance prior is locally 
uniform (locally, in this sense, means the parameter range where 
the mass of the likelihood is concentrated). One then performs the 
inverse parametrisation transformation on the uniform prior to ob- 
tain the appropriate prior in the original parameterisation. Finding 
such a transformation can, however, be very difficult for a general 
multi-dimensional prior. 

Nonetheless, in a large majority of applications, the parame- 
ters be may assumed independent, so that the prior factorises 



h,e 2 ...,e n ) =7ri(0l)7T2(02). 



i(0n 



(20) 



For one-dimensional distributions, Jeffreys devised a general way 
to derive the non-informative prior on a parameter based on in- 
variance properties of the likelihood under a change of variable. 
The Jeffreys rule for constructing ignorance priors for the one- 
dimensional case reads 



7T(0)oc7 1/2 (0), 



where 



d 2 In £(6) 
<9<9 2 



(21) 



(22) 



is the Fisher information. We will adopt this approach and now 
consider the prior on each parameter of interest. 



3.3.1 Prior on positions 

It is obvious that the distribution of sources is not uniform across 
the sky. The Galactic regions (Milky Way and Magellanic Clouds) 
have a much higher density of detectable sources than the rest of 
the sky. Moreover, assuming extra-galactic sources to be uniformly 
distributed across the sky (no clustering) is not sufficient to en- 
sure that the distribution of detectable sources is uniform, since the 
background/noise is itself inhomogeneous over the sky. 

6 These priors usually need not be properly normalised, since one wishes 
only to locate the maximum of the posterior distribution and the normalisa- 
tion does not depend on any parameters. 
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Nonetheless, PwS divides the sky into small patches and, in 
each such region, the assumptions of background homogeneity and 
a uniform source distribution are reasonable. Moreover, if the sky 
patches used are sufficiently small, our locally uniform model can 
easily cope with clustering when the gradient of the density of 
sources is small across the patch boundaries. The correctly nor- 
malised positions prior for the complete ensemble of sources in a 
patch is simply 

1 



Pr(X JVs |iV s , iV pix ) 



1 v piX 



(23) 



where JV p ix is the number of pixels in each patch and iV s is the 
number of sources in that patch. 



3.3.2 Prior on the number of sources 

Following the same rationale of local uniformity, i.e no clustering, 
the probability of finding N s objects (above a given flux limit) in a 
sky patch follows a Poisson distribution 



^(7V s ) = Pr(iV s lA)= e - A — . 



(24) 



where A is the expected number of such objects in that region. 
Moreover, A should be proportional to the region size A = 
A s iVpixAp, where A s is the number of sources per pixel and A p is 
the pixel area. Note that A s may change across the sky as we are 
only enforcing the uniformity locally within each patch. 



3.3.3 Prior on flux 

A good flux estimator should be unbiased, but this goal is often 
problematic. The optimal estimators in the sense of decision the- 
ory, i.e. those that minimise the expected loss/cost, are most often 
biased and they combine the data with external information from 
ancillary data sets. PwSII thus includes two different sets of flux 
priors with distinct goals. 

• Non-informative. Our data model depends linearly on the 
source fluxes Aj and is a particular case of the general linear model 
(Box & Tiao 1992). Considering only a single source for simplic- 
ity (the solution for multiple sources is a mere repetition of this 
simpler case), one may show that the likelihood can be written in a 
form that makes it clear that the flux is in fact a location parameter: 



£-H.(Aj) oc exp 



E„ Qn(v)\r(v,a,i) 



(25) 

where Aj is the MMF estimate of the flux (17). The same result 
could have been obtained directly using formula (21). Thus, the 
prior on the flux must be locally uniform: 



n(Aj) oc c, 



(26) 



where j indexes the source. For a more general and rigorous treat- 
ment see Box & Tiao (1992). 

• Informative. Owing to the different statistical properties of 
point sources and SZ galaxy clusters, a different prior applies in 
each case. For point sources, we adopt the flux prior first suggested 
by Argiieso et al. (201 1), 



■K(Aj) = Pr(A j \A oP 'y) oc 



1 + 



(27) 



exponent controlling the shape of the power law for fluxes much 
larger than the 'knee'. This provides a good model for the observed 
distribution of fluxes, fitting the de Zotti model almost perfectly (de 
Zotti et al. 2005). Moreover, the distribution can be properly nor- 
malised as required for evidence evaluation. PwS truncates the dis- 
tribution faint tail and re-normalizes the remaining range as result 
of the early selection effect (see 3.3.6), a practice the proponents 
of the distribution also followed. For galaxy clusters, the derivation 
of the prior follows a different approach. The Planck Sky Model 
(PSM vl.6) (Planck Sky Model 2007) was used to draw realistic 
simulations of the cluster populations assuming a standard WMAP 
best-fit ACDM cosmology (Hinshaw et al. 2009) and the Jenkins 
mass function (Jenkins et al. 2001). We found that the fluxes in the 
sample cluster catalogues were quite well fitted by a power law: 



Tv(Aj) oc A] 



(28) 



To deal with the early selection threshold and to provide a properly 
normalised distribution, once again a minimum and, this time, a 
maximum flux also were assumed. 



3.3.4 Prior on size 

• Point sources. Point sources are best modelled by imposing 
the prior n(r) = 8(r) on the 'radius'. This condition might, how- 
ever, be too restrictive, since to simplify the implementation of the 
code and to make it faster, PwS assumes the instrument beams are 
circularly symmetric, which is only an approximation to the true 
beam shapes. Thus, even for point sources, allowing the source ra- 
dius to vary over a small range of values allows a better fit between 
the template and the pixel intensities and consequently a higher 
likelihood ratio/SNR value. Thus, in both the informative and non- 
informative case, our preferred radius prior for point sources is 



1/A 




rj < A 
rj > A 



(29) 



where A <JC FWHM (the full-width-half-maximum of the beam). 

• Galaxy clusters. Turning to galaxy clusters, a significant frac- 
tion of the clusters Planck will detect will be unresolved, and 
thus appear as point sources with a distinctive spectral signature. 
In many cases, however, galaxy clusters are large enough to be 
mapped as extended objects and a parameter controlling the scale 
of the cluster profile, the radius, needs to be included. The informa- 
tive prior on the radius was derived using the same procedure as in 
section 3.3.3 and an exponential law 



ir(rj) oc exp (~y) 



(30) 



was found to fit the simulated catalogues very well. We truncate the 
distribution outside a minimum and maximum radius. 

The non-informative prior follows a different law from that ex- 
pected from the cosmological models. Our model for an individual 
source is the convolution of the source profile with the beam PSF 
The radius parameter r' s that scales the resulting shape is a 'hy- 
brid' parameter, as it shifts and scales the likelihood at the same 
time (Jaynes, ch. 12). After applying the Jeffreys rule, the non- 
informative prior on r' s reads: 



7r(r' s ) oc -j. 



(3D 



Assuming that either the profile or the beam have centroids at the 
origin and the profile is a scaling profile r(r/r 3 ) then, 



where Ao is the 'knee' flux, p is some positive number and 7 is the 



^B 2 + k 2 r 2 , 



(32) 
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where B 2 is a constant known as the function variance of the beam 
(Bracewell 1965) and k 2 is another dimensionless constant, the 
variance of the dimensionless variable r/r s over the profile. The 
non-informative prior for the radius parameter then reads: 



7r(r) oc 



(B 2 + r 2 ) 3 



(33) 



where B = B/k. For the general case B 2 , the variance of the beam, 
should be replaced by the variance of y/Vj(r)). For unresolved 
objects, narrow clusters with radii smaller than the beam size, the 
prior grows linearly with r. For well resolved objects, r 3> B, the 
prior decreases proportionally to r~ 2 . 



3.3.5 Prior on spectral parameters 

There is an extensive literature on the distribution laws of radio 
source spectral indexes : de Zotti et al. (2010), (Planck Collabo- 
ration 2011, d,e). In general Gaussian distributions, or Gaussian 
mixtures with two modes, fit the available data reasonably well. 
However, the most interesting sources are exactly those that do 
not follow the canonical laws of emission. To avoid narrowing the 
range of possible alternatives too much, uniform priors are proba- 
bly better choices unless we choose to target a very specific family. 
The same holds for dusty galaxies. 

By applying our standard procedure, the non-informative prior 
on the spectral parameters reads 



n(aj) oc 



E 



dS v (cij 



(34) 



where the sum extends over all frequency channels. 



3.3.6 Prior on the models 

The prior ratio Pr(i3i)/ Pr(iio) on the models is often neglected 
(i.e. assumed to equal unity), but plays a very important role in 
the PwS detection criterion. To give a proper account of its nature, 
let us imagine the simplest possible detection problem, where we 
know in advance all the true values of the parameters that define an 
object, which translates into delta-functions priors. Substituting this 
condition into (6) and making use of (18), we obtain the following 
inequality: 



SNR 



H 



(35) 



One may interpret the term In ( j^Jf^) ) as an extra 'barrier' added 
to the detection threshold because we are expecting more fake ob- 
jects than the objects of interest, due to background fluctuations. 

We saw earlier that, when an object is present, a local maxi- 
mum in the likelihood is always present in the position parameter 
sub-space. This condition immediately implies that only likelihood 
maxima need be analysed. Nonetheless, one expects other likeli- 
hood maxima to occur as a result of background fluctuation 'con- 
spiracies'. Assuming Poisson statistics for the number of sources 
and the number of likelihood maxima resulting from the back- 
ground fluctuations, then the ratio of the probabilities is given by: 



Pr(ffi|JV„) 
Pr(Ho\N s ) 



(36) 



where Ao is the expected number of maxima per unit area result- 
ing from background fluctuations above the minimum limit of de- 
tection of the experiment, and Ai the expected number density of 
sources above the same limit. 

If only background is present, the density of maxima, Ao, re- 
sulting from the filtering procedure that creates the likelihood man- 
ifold can be estimated using the 2D Rice formula: 



n b (v, k, e) 



8\/3n;, 
tt\/1 - P 



: e(K 2 — 4e 2 ) e 



sd-f 2 ) , (37) 



where v = A/ a is the 'normalised peak amplitude', k the 'nor- 
malised curvature', e the 'normalised shear', and p = <T 2 /(aoC72), 
with a 2 n = (27r) 1+2n / o °° ri 1+2n \V(v)\ 2 d V (Lopez-Caniego et al. 
2005). Marginalizing over all parameters we obtain the expected 
density of maxima of a Gaussian filtered field, which reads 



n b = 



8TiV3a 2 ' 



(38) 



One is not interested, however, in all peaks, but only on those 
above a certain level vq, since PwS pre-selects the putative detec- 
tions by imposing a minimum SNR level before attempting the evi- 
dence evaluation. The main reason for adopting this early selection 
is computational efficiency. The SNR alone provides a good proxy 
(see formula 18) for deciding whether a candidate peak is the result 
of the presence of a source or just a background fluctuation. More- 
over, low SNR peaks tend to be 'badly-shaped' making the sampler 
very inefficient and resulting in a very large fraction of the samples 
being rejected. To make the things even worse, in most cases, these 
peaks themselves end up being rejected as objects. 

The applied flux cut must be taken into consideration to eval- 
uate the correct expected number counts, which define the prior 
Pr(iii) as well. Thus, A will read: 



/"OC 

Ao = / n b (v)dv, 



(39) 



where nh(y) is given by: 



+ 



(l + erf e~4 (v 2 - l)p 2 Pl + 



— 



-pp\ 



(40) 

where p\ = \J1 (1 — p 2 ) and p2 = \jl (§ — p 2 ). The expected 
number count of targeted objects above a certain flux threshold 
S, Ai = (N(> S)), may be easily derived from their differential 
counts. 

Now a distinction must be made because the dominant type 
of extra-galactic point sources in Planck maps are galaxies which, 
in principle, do not follow the same statistics as the galaxy clus- 
ters. From general cosmological assumptions it is possible to derive 
that the expected differential counts for a certain population type of 
galaxies per flux interval at a certain frequency always follow a 
power law: dN^ /dS = A$ S~ b (de Zotti et al. 2005). For clusters 
of galaxies, however, one must instead use a realistic set of sim- 
ulations, such as the 'Planck Sky Model' (PSM vl.6) (Planck Sky 
Model 2007). Using a properly normalised mass-function (Jenk- 
ins et al. 2001), one finds that a power law also fits quite well the 
expect number counts of clusters above a certain threshold. So, in 
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either case, point sources or clusters, Ai may be written as: 

POO J AT 

Ai = N(> So) = J -^dS = A* (1 - by 1 Slr b , b ± 1, 

(41) 

where we keep the parameters {A^, b} free. These parameters are 
usually provided by the user to target a specific type of object 
and/or instrumental setup. 



4 OBJECT DETECTION STRATEGY 

So far we have only developed the logic and probabilistic underpin- 
nings of PwS. It is now time to bring all the pieces together into a 
consistent strategy for the detection and characterisation of discrete 
objects. Our aim is to construct a robust, controlled, and predictable 
algorithm. Some caveats will be identified and solutions suggested, 
always justified within the framework presented above. 



4.1 The single object approach 

Let us return to formula (6). At a first look, the evaluation of (6) 
seems quite a daunting task. In order to apply the full Bayesian 
approach, many complex integrals, over a very high dimensional 
volume (at least 4 x N s ), need to be evaluated. 7 . Clearly a brute 
force method is not efficient and perhaps not possible, even with 
the massive computing resources generally available. 

To find an effective solution, we begin by making two impor- 
tant assumptions: (i) the objects of interest are 'well separated', so 
that (16) holds; and (ii) all variables pertaining to each individual 
source are mutually independent, which has already been implicity 
assumed throughout the exposition of our inferencial infrastructure. 

These conditions allow us to separate the integrals associated 
with each source. This is a very important simplification because 
it is now possible to deal with each source independently, one at 
a time. This is the 'single object approach' (Hobson & McLachlan 
2003) and replaces a single -/V param x N s -dimensional integral with 
a sequence of N s integrals, each of dimension iVparam. 

The complete likelihood expression may now be replaced by 
the much simpler 'single source' form. Although, we should ex- 
ercise some care in defining the limits of integration in position 
space, since no significant likelihood mass can be shared among 
position integration domains. Apparently, this requirement creates 
such a wealth of complexity to the integral evaluation that the sin- 
gle source approach might at first be considered a poor choice. For- 
tunately, the method PwS uses to evaluate the evidence integrals 
automatically enforces this rule if the fields are not too crowded 
(see section 4.2). 

Under our assumptions, the odds of the model Hi (for a given 
source type), given N s such sources, reads 



Taking logarithms and rearranging, one finds 



Pr {Hi\d,N, 



Pr(H \d,N s ) y ^ v > N s \ VAo 



2 = 1 

(42) 

where we have defined the 'partial evidence' for each individual 
source as 



Zij 



i)d&j. 



In 



Pr(fli|rf,jV s 



= £ M-Zii) - N.P., (44) 

2 = 1 



_Pr(H \d,N, 
where we have defined the 'penalty per source' P 3 as 



P s = In A s 1 + In 



+ _[Ai+lnJV s !]. 



(45) 



Thus, the total ln(odds) for a single patch is the sum of the partial 
In (evidence) for each source, plus an extra global penalty term that 
contributes, in the majority of the cases, negatively to the final bal- 
ance and does not depend on any particular source, but exclusively 
on the ensemble properties. 

The most robust source catalogue is that which maximises the 
ln(odds) in (44), but we do not know the value N s . Moreover, we 
have not yet addressed how many or which aspirant detections will 
be finally selected for inclusion in the catalogue. Nonetheless, the 
expression (44) is a sum, so its maximum value is reached when 
only the positive terms are included. Thus, one possible procedure 
to select the optimal set of sources is as follow: 

(i) evaluate each partial ln(evidence); 

(ii) sort them in descending order; 

(iii) to each line add the global P s term, with N s — 1 for the 
first line, N s = 2 for the next line, etc. 

(iv) starting from the line with the lowest partial ln(evidence), 
move up throwing away all lines for which the sum of the above 
contributions is negative. 

The 'proto-catalogue' is now made and N s found. 

We are not finished yet, however, because we have only se- 
lected the set of detections that maximises the odds. Other con- 
straints may yet apply. For instance, we may impose a threshold 
per line different from zero as result of the loss criteria or, as we 
shall see, a prescribed contamination for the catalogue. Moreover, 
the P s terms added to each line have different values, but only their 
sum has a well-defined meaning, which applies to the full cata- 
logue taken as a whole. Therefore, we next average the P s over the 
proto-catalogue and add it to each partial ln(evidence) to obtain the 
ln(odds) for each object 



ln(odds)j 



Pr(#i|d) 



Pr( J ff„|d) 



= ln(^) + (P s ). (46) 



(43) 



This quantity has a pivotal role in catalogue making (see section 
4.6). 



4.2 Evaluation of the odds ratio 

Even using the simplified form of the likelihood assumed in the 
single-object approach, a 'brute force' evaluation of the result- 
ing evidence integrals is still not feasible. One must instead use 
a Monte Carlo approach to the numerical integration. Evidence 
integrals are usually evaluated using Markov Chain Monte Carlo 
(MCMC) methods and thermodynamic integration. Such methods 
can fail, however, when the posterior distribution is very complex, 
possessing multiple narrow modes 8 that are widely separated. We 
therefore instead use 'nested sampling' (Sivia & Skiling 2006), 
which is much more efficient, although not without its difficulties. 



7 Even when working with one small patch at a time, seldom N s is smaller 8 At least one central maximum per source plus other secondary maxima 

than 4. around the central higher peaks (Carvalho et al. 2009). 
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Feroz et al. (2009) have developed a very efficient implementa- 
tion of the nested sampling algorithm, called 'MultiNest', which 
is capable of exploring high-dimensional multimodal posteriors. 
Nonetheless, MultiNest is designed to be a general sampling and 
evidence evaluation tool and it is not particularly tuned for Planck. 

In the interest of speed, PwS instead tries to take full advan- 
tage of the properties of the astronomical data sets. As already 
stated (see section 3.3), if our model explains the data well then 
the likelihood should peak steeply around the parameter true val- 
ues, decay very rapidly to zero, and have most of its mass con- 
centrated around the maxima vicinities. Thus, if one can first find 
the likelihood maxima, then one does not need a sophisticated 
multimodal sampling algorithm like MultiNest. A much simpler 
nested sampling scheme such as Mukherjee et al. (2006) would 
perform equally well. Moreover, reasonably high SNR maxima de- 
velop 'well-shaped' peaks, in the sense they are close to Gaussian, 
rendering the sampling highly efficient. Two other significant ad- 
vantages are: (i) we can reduce our data set to a small neighbour- 
hood enclosing the maxima, so that only a very small number of 
pixels close to the maxima contribute appreciably to the evidence 
value; and (ii) a much reduced parameter volume allows the same 
number of 'live points' to deliver a considerably higher accuracy on 
the evidence value, since they do not split among the several pos- 
terior peaks. This is the approach adopted in PwS, which we now 
outline in more detail. 

4.2.1 Locating the likelihood maxima 

Our first goal is to find the likelihood maxima. For illustra- 
tion, let us focus on the example of galaxy clusters, each of 
which is described by 4 parameters: {X, Y, S, R}. An efficient 4- 
dimensional minimiser implementation is straightforward and im- 
mediately available (Press et al. 2007). However, our manifold 
has many maxima and we need to check all of them, otherwise 
we might lose some sources. 

One possibility would be to follow the approach used in PwS 
I, where the Brent line minimiser was 'enhanced' with an ancillary 
step to allow it to 'tunnel' from one minimum to the next one using 
a scheme closely related with the equivalent quantum mechanical 
effect. To increase the effectiveness of the procedure, PwS I started 
a Powell minimization chain (hence the name 'PowellSnakes') in 
many different locations of the manifold in an attempt to find all 
the maxima. It should be remembered, however, that the likelihood 
only exhibits multiple maxima in the position sub-space; the other 
sub-spaces are 'well behaved'. Moreover, the likelihood in the po- 
sition sub-space is merely the MMF filtered field. We therefore in- 
stead use a brute force peak finding algorithm that scans all pixels 
in this subspace, which is very easy to implement and almost in- 
stantaneous. Then, after collecting a list of peak positions, we start 
a 4-dimensional PowellSnakes optimisation at each such location to 
find the maximum-likelihood parameters for that particular peak. 

A subtlety does arise in this approach, however, since to ob- 
tain the MMF filtered field, one needs to assume a size R for the 
objects to define the filter. Since we expect different clusters to 
have different radii, we might lose some peaks because of the mis- 
match between the true value of the cluster radius and that used in 
the filtering template. A simple solution would be that suggested 
by the MMF authors: apply the filter repeatedly using a differ- 
ent radius each time. Although practical, this is, however, not the 
most efficient approach. Fortunately, if the instrument beams and 
the sources possess reflection symmetries in both axes, then one 
can show that the Fisher matrix at each likelihood peak is block- 



diagonal (assuming the likelihood (15) and using the single-source 
approach assumption (16)), such that there is no correlation be- 
tween the position subspace and the other parameters (flux and 
size) of the cluster. This has two important consequences: (i) re- 
gardless of the radius used to construct the filter, a likelihood peak 
will always be present at the location source and its position will not 
change positions as the filter scale varies; (ii) we do not need to per- 
form a full 4-dimensional maximization but can (at least) separate 
the position variables from all others, which brings a tremendous 
simplification to the problem of finding the likelihood maxima. 
Thus, we can indeed start by finding the maxima in the position 
subspace using a brute force 'check-all-pixels' approach and then, 
after pinpointing the position of the source, search the remaining 
sub-spaces associated with the other variables. 

A couple of final comments on this approach are worth mak- 
ing. First, it is well known that matched filters are excellent at find- 
ing and locating sources, but not as good at estimating fluxes. If 
the beam shape/size is not completely known but symmetric, even 
when building up a filter with the wrong beam geometry, the filter 
will correctly recover the positions of the objects. In general, how- 
ever, the element in the Fisher matrix corresponding to the correla- 
tion between the radius and the flux of an object is non-zero. There- 
fore, if the filter is assembled using wrong beam parameters, bias 
in the flux estimates must be expected. Second, and perhaps more 
subtle, is that the symmetries of the Fisher matrix only hold on 
average. Thus, for each individual peak some residual correlation 
between the position and the other variables is expected. According 
to our current accumulated experience, however, this correlation is 
usually very small. Nonetheless, PwS still includes the option to 
use the peaks positions obtained from the MMF filtered fields just 
as initial hints for a full N-dimensional Powell minimisation. 

4.2.2 Exploring the posterior distribution 

Our initial step provides the ML estimates and the SNR of each de- 
tection candidates. This has a very useful side effect, since we do 
not need to explore the posterior distribution around all the max- 
ima we find. Only a much smaller sub-set is chosen based on an 
SNR threshold. This SNR threshold should be low enough not to 
reject any substantial fraction of peaks associated with true detec- 
tions and high enough to make the selected sample contain a large 
percentage of true sources and to include most 'well-shaped' max- 
ima. This shorter list is then sorted in descending order of SNR 
and one-by-one the maxima are sent to the nested sampler, which 
returns an evidence estimate and a set of weighted samples that 
we use to model the full joint posterior distribution. From these 
samples we can compute any parameter estimate, draw joint dis- 
tribution surfaces, predict HPDs intervals of any content over the 
marginalized distributions to infer the parameter uncertainties, etc., 
as in the example presented in (Planck Collaboration 201 1, c - fig. 
9). The current released implementation of PwS (v3.6) computes 
the maximum-likelihood, the expected value over the posterior es- 
timates and la error bars. 9 

4.3 Non-gaussianity of the background 

It is clear that our model of the observations, like any model, is only 
an approximation to the real data. This is true both for our model of 

9 The maximum-likelihood estimates from the maximization step are up- 
dated, if necessary, during the sampling phase. 
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the discrete objects and for our model of the background. For the 
latter, it is clear that the background emission in real observations 
is neither Gaussian nor statistically homogeneous. Regarding non- 
Gaussianity, we do not mean that of a primordial origin, which, if 
exists, would have an insignificant effect in our analysis. We are 
instead alluding to the non-Gaussianity induced by the Galactic 
emission components, the confusion noise created by the sources 
below the detection threshold, the instrumental noise artefacts com- 
ing from the incomplete removal of the cosmic rays glitches and, 
of course, a wealth of other possible sources. 

Many authors simply ignore this issue and many others dis- 
miss its importance. A very strong argument, used many times, is 
that despite the sky emission being admittedly non-Gaussian, the 
effect of the finite PSF of beams will combine many different sky 
locations into a single pixel. In addition, signal de-noising proce- 
dures further combine more samples. Some authors then appeal to 
the Central Limit Theorem (CLT) to claim that non-Gaussian ef- 
fects in the final data must be completely negligible. 

This argument seems particularly appealing, but a deeper anal- 
ysis of the CLT shows that, in our particular problem, namely de- 
tection and separation of two signals, the effects of the CLT are not 
as important as those authors claim. Formally, the CLT only ap- 
plies when N — > oo, where N is the number of random deviates 
in the sum. For finite N, the CLT only guarantees the Gaussian ap- 
proximation is good for 'a region around the mode' (Bouchaud & 
Potters 2009). The size of this Gaussian region grows very slowly. 
In the worst case, the distributions of the individual deviates are 
skewed and have 'fat tails'. Let us focus on a real example: the 
Galactic emission. If the spectral brightness distribution follows a 
power law with a finite second moment, to guarantee the field has 
physical behaviour (= finite energy), the normalised central Gaus- 
sian region, |it|, only grows very slowly with N: 



\u\ < M oc VlnN 



(47) 



where uo is the tail lower boundary. This means that the sum must 
have more than 1000 terms to make the Gaussian approximation 
acceptable up to about \u\ ~ 2.6. In detection problems, where 
we want to separate the maxima created by the sources from the 
background fluctuations, we are dealing all the time with the back- 
ground distribution upper tail: 



POO 

> = / Pr(u) . 

Jun 



(48) 



If the background field intensity distribution follows a power law: 
Pr(/„) oc with /i > 2, to guarantee its energy is finite, then 

the probability that a sum of N deviates falls into the upper tail 
region of the sum normalised distribution is: 



V> oc 



(49) 



"0 ATM/2-l m M/2 N - 

This is a very serious problem. Object detection methodologies are 
designed typically to suppress the background and amplify what 
does not fit its model. The non-gaussianity component is not part 
of our background model, so its effect on the detection process is 
doubly pernicious: not only it is not removed, it is amplified. 

There seem to be only two ways of circumventing this prob- 
lem: (i) to include the non-Gaussian effects in the statistical models; 
and (ii) to manipulate and add as much data as possible to make it 
more Gaussian. Owing to the complexity of Planck data it is al- 
most impossible to give a proper account of the non-Gaussian ef- 
fects without making the problem unsolvable. So, a workable solu- 
tion must necessarily combine as much data as possible, and then 



analyse the outcome. The only possible way of doing this is to use 
multi-channel analysis all the time. 

Our own experience corroborates this view. The SNR values 
of the PwS selected detections and the thresholds the frequenstist 
methods normally employed (> 4.0), are much higher than what 
would be expected according to the purity levels of the catalogues 
if the statistics were purely Gaussian. Although, the channels with 
the largest beams, where each pixel is the result of a much higher 
number of different contributions, do indeed have detection thresh- 
olds lower and closer to those expected from the Gaussian theory. 
A good practical example of how the multi-channel processing can 
help the reduction of the impact of the non-Gaussian distributions 
on the detection process is the recovery of the SZ signal (Melin et 
al. 2011). 

Owing to the residual non-Gaussianity left in the background, 
especially close to the Galactic plane, we should now expect a 
higher number of background fluctuations reaching above the evi- 
dence threshold level than those predicted by the Gaussian model. 
So, eventually, we need to correct the prior on the models: , 
as this prior was derived assuming that the background had purely 
Gaussian statistics. The simplest way, we believe, is just to count 
the total number of fluctuations above the SNR threshold adopted, 
before embarking on the evaluation of the evidence. In particular, 
one should compare this number with what would be expected from 
the Gaussian model plus the predicted source counts above the SNR 
threshold, then take the larger quantity. Denoting this value by T, a 
corrected estimate of Ao (see formula 36) would read 



Ao~T-Ai. 



(50) 



This very simple 'trick' provides a first-order correction to the ef- 
fects of background non-Gaussianity. 

4.4 Statistical inhomogeneity of the background 

Real observations will also inevitably exhibit some statistical in- 
homogeneity of the background, in contradiction to our assumed 
model. Consequently, the conditions of optimality derived there- 
from no longer hold. This can lead to a number of difficulties in de- 
tecting and characterising discrete objects, particularly in regions 
of the sky that contain bright, very inhomogeneous and anisotropic 
backgrounds. Indeed, this general expectation has been borne out in 
applying earlier versions of PwS to detailed simulations of Planck 
observations (PSM 1.6, Planck Sky Model 2007). In particular, the 
presence of bright diffuse Galactic dust emission was found to lead 
to the PwS SZ catalogue (in common with catalogues produced 
by other methods, such as MMF) containing bright spurious detec- 
tions. Hence one did not obtain a regular cumulative purity curve 
that slowly approaches unity as the ln(evidence), or the SNR, in- 
creases (Melin et al. 2011), in contradiction to what would be ex- 
pected from theory if our model explained the data properly. 

Indeed, the detection of SZ galaxy clusters highlights further 
problems. Again in the analysis of Planck simulations using pre- 
vious versions of PwS, one finds that bright spurious SZ signals 
are not only concentrated in complex background regions, with 
a fraction of the bright spurious detections spread all across the 
sky. By cross-correlating the resulting SZ catalogues with ancil- 
lary point source data sets, one finds that bright spurious cluster 
detections matched bright point source locations. In our prelimi- 
nary attempts to address this problem, we therefore first performed 
a point source extraction step and subsequently subtracted/masked 
the best-fit point source profiles in the maps. This pre-processing 
step greatly helped in reducing the number of spurious detections, 
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especially those with very high evidence values. Another approach 
has been suggested by the Planck WG5 team, namely the '\ 2 
test' (Planck Collaboration 2011, c). This performed very well, 
although, once more, there is no easy way to choose a robust ac- 
ceptance/rejection threshold for the test. Another difficulty occurs 
when extracting the SZ effect at each individual channel. The SNR 
was usually so low that the measurements ended up being quite 
noisy. 

Can we do any better using Bayesian logic? The apparent fail- 
ure of the 'best' test can be immediately explained using the main 
Bayesian decision equation, equation (6). Our decision criterion is 
based on the ln(odds), namely 



In 



Pr(tfi|d) 



(51) 



_Pr(H \d)_ 

The problem comes from the denominator Pr(Ho\d). When we 
find a point source, its probability of being a cluster, Pr(_ffi \d), is 
very low, but the probability of those pixels being part of the back- 
ground, Pr(Ho\d), is also very low, because point sources do not 
fit our model of the background either. We have already mentioned 
that the binary model is too simple to handle realistic astronomical 
situations. To secure the optimality of our methodology we must 
ensure that the data is well described by our model, and employ a 
multi-model approach, as described in Section 4.5 below. 

4.5 The solution: multi-model, multi-frequency detection 

For the reasons outlined above, we believe that a deeper and purer 
catalogue can only be obtained through multi-frequency analysis, 
which cannot be pursued without assuming some spectral signa- 
ture. An excellent example of the power of such an approach is pro- 
vided by the detection of SZ clusters. Despite the SZ signal being 
sub-dominant on all Planck channels (the signal level is below that 
of the background), an optimal combination of the different fre- 
quencies can boost these extremely faint signals to the point where 
one can now build reliable catalogues of many hundreds of such 
objects 

We have also demonstrated above that our simple binary de- 
cision making approach is too naive to handle 'real-life' situations. 
The introduction of a multi-model (more than two models) decision 
rule cannot, however, be achieved simply by extending the binary 
case (Jaynes, Ch. 3), although it is always possible to reduce the 
general multi-model decision rule to a succession of binary ones. 
We start by choosing one of the hypothesis, say Ho = 'this maxi- 
mum is a background fluctuation , and making it the 'null' or 'refer- 
ence hypothesis' . Then we iterate through all the hypotheses associ- 
ated with different source families and we compute the Qi = odds 4 : 

_ Pr{Hi\d) 



i ^ 



(52) 



Pv(H \dy 

The optimal way of deciding between M + 1 different hypothe- 
sis (M source types plus the null hypothesis) is by evaluating the 
odds for each type of source against the null hypothesis, pick up 
the largest Qi, which we denote by Qi*, and then check for the fol- 
lowing inequality 



1 + E Mi , Qi h 



(53) 



where £ — £spurious/-£''miss is the ratio of the losses when accept- 
ing a false positive (spurious) and when missing a source. We 
are implicitly assuming that the penalty for choosing wrongly in 
favour or against the most probable hypothesis, Hi*, is always 



-^spurious Or -/vmiss regardless of the true/alternative hypothesis and 
there is no loss when choosing wrongly between any of the alter- 
native hypotheses. 

A pivotal quantity in catalogue making, as we shall shortly 
see, is the probability that axertain entry in the putative catalogue is 
a spurious detection: Pr(Hi*\D). Providing this value is a unique 
capability of the Bayesian approach. It is very simple to show that, 
when extending the binary test to multiple hypotheses, the proba- 
bility of a spurious detection now reads (Jaynes, Ch 3): 



Pr{Hi*\d) 



with ip 



(54) 



4.6 Catalogue making 

The last step of PwS is to assemble the final catalogue from a list of 
candidates. During this stage, PwS performs the following steps: 

(i) maps flat sky patches back onto the sphere at the the positions 
of the putative detections; 

(ii) applies a detection mask, if any;. 

(iii) merges multiple detections of the same source obtained in 
different patches into a single candidate detection; and 

(iv) makes the final catalogue by rejecting those lines that do not 
meet the pre-established criterion of purity or loss. 

The last step is critical to the success of our methodology. We al- 
ready gave some indication in Section 2.2.3 about how to address 
the difficult task of selecting a sub-set of detections from our ini- 
tial list of candidates. If the selection criterion is based on losses, 
then we just need to trim the 'proto-catalogue' further by apply- 
ing the decision rule (53). But, as we mentioned previously, it is 
much more common in astronomy to require a catalogue to have 
an expected contamination ratio or that the contamination does not 
exceed a prescribed value. We are now finally in position to show 
how the Bayesian logic framework can give us exactly that. 

The number of false positives in a catalogue may be repre- 
sented as a sum of Bernoulli variables. Assuming all catalogue en- 
tries are statistically independent, then the sum of N of those vari- 
ables is distributed as a Poisson-binomial distribution: 



E 



Pi. 



^2piO--Pi), 



(55) 



where pi = Pri(Hi* \d), is the probability of source i being a false 
positive. 

Therefore, one way to proceed is as follows: 

(i) sort the list of candidate detections in ln(odds) descending 
order (pi ascending order); 

(ii) for each candidate, accumulate pi until fi (see for- 
mula 55) exceeds the prescribed contamination a = 
(spurious detections)/(total lines in catalogue) times the total 
number of lines already included; and 

(iii) discard the last line. 

As /i is a sum of independent variables and N is usually a large 
number (hundreds), it is perfectly reasonable to assume the distri- 
bution converges to a Gaussian as result of the CLT 10 . So, a good 



10 Note this time we are working around the distribution mode. 
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estimate of the number of spurious detections in the catalogue is 

(56) 



j \ Z- — j 

1 \ i=l 



and an estimate of the fraction of spurious detections in the cata- 
logue, a, reads: 



e^pA ± yte^ 1 -^ (57) 



N 



N 



A problem still remains, however, since our calculation of 
Pri(Hi* \D) is only an approximation, although we do have an es- 
timate of the ln(odds) evaluation uncertainty (for a rigorous treat- 
ment see Keeton (2011)). We therefore need to introduce correc- 
tions into the above formulas to account for the uncertainty on pi. 
It is easy to verify that, to a first approximation, the error on pi, 
reads: 



|Api| ~ 7Pi(l -pi) 



(58) 



where the value of 7 is the average evidence evaluation fractional 
error. The corrected value of the catalogue's variance on the number 
of spurious, a' 2 , is always less than: 



^ ,2 <(i+7)E^( 1 -^)' 

i=l 

and the variance on /j, reads: 

n n 

|A/j| 2 ~ 7 2 ^p 2 (l -pi) 2 < -y^Piil -pi 



(59) 



(60) 



Thus, we get the final expression of predicted contamination of the 
catalogue by adding both contributions in quadrature: 



EN 
i=lPi 

N 



± • (61) 



N 



The uncertainty on the contamination of the catalogue for com- 
monly accepted levels (~ 10%), catalogue size (> 1000) and 7 as 
large as 0.32 (value taken from the extraction exercises with Planck 
data), is always < 1.2%. 

Finally, we are now in position to answer the key, question all 
the frequentist methods must at some point face, "What threshold 
should one use for accepting the candidates for inclusion in the fi- 
nal catalogue?", although the question is no longer relevant in our 
Bayesian approach, since it is an output of our catalogue-making 
method, rather than an input. The answer is just "the In(odds) es- 
timate of the last line of the final catalogue", since the initial list 
of putative detections was sorted in descending order of ln(odds) 
and all those with a higher or equal ln(odds), and only those, were 
selected for inclusion. 



5 IMPLEMENTATION HISTORY 

The data analysis philosophy and set of algorithms described in 
this paper have not so far been fully implemented in a coded ver- 
sion of PwS. We are working towards this aim, and the release cor- 
responding to the full set of features described here will be PwS 
v4.0. The versions that have been used in published data analy- 
ses so far are vl.5 and v3.1 for the SZ Challenge (Melin et al. 
2011), v2.01 for the lower frequency point sources in the Planck 



ERCSC (Planck Collaboration 201 1, b) and for all frequency chan- 
nels in the Compact Source Investigation workshop (CSI) (Rocha 
et al 201 1), v3.6 in application to SZ cluster detection in the Planck 
ESZ sample (Planck Collaboration 2011, c) and to characterise a 
single cluster parameters in a non-blind exercise (Planck Collabo- 
ration et al. 201 1, g). It is worth noting that these versions include 
a pre-processing tool specifically designed to convert data sets dis- 
tributed within the Planck collaboration into the format required by 
PwS. The main tasks performed by this tool are: 

• taking account of the masking and/or flagging of ill-observed 
pixels and contaminated regions; 

• projecting the spherical maps into flat patches 11 ; 

• mapping of coordinates from the sphere into the patches and 
back; 

• removal of multiple detections of the same source in different 
patches; 

• assembly of the output catalogues into the required format. 

The existing released versions of PwS differ from what will be 
available in v4.0 mainly in the limitation to a binary model selec- 
tion step in determining when to accept a putative source detection 
and to a non-parametrised frequency spectrum in multi-frequency 
detection. The latter restriction meant that, while SZ cluster detec- 
tion could be carried out using all Planck frequencies simultane- 
ously, point source detections, in common with the other methods 
available, were carried out for each frequency channel separately. 
PwS v4.0 will aim at genuine multi-frequency and indeed multi- 
model detection, using all the available data simultaneously. 



6 CONCLUSIONS 

The Planck satellite, and many other modern cosmological, data 
sets present completely new challenges for the detection and de- 
scription of compact objects. Two important traits of such obser- 
vations are (i) low or very low SNRs; and (ii) strongly correlated 
backgrounds with typical scales similar to those of the objects 
being sought. These attributes render traditional object detection 
methods sub-optimal, since: (i) it is difficult to separate the sources 
from the background fluctuations; and (ii) the uncertainties on de- 
rived source parameters are important and traditional methods do 
not provide them. 

A better strategy is to develop an object detection methodol- 
ogy from a strong statistical foundation first. The linear filtering 
family of tools is the attempt by the orthodox frequentist school of 
probability to overcome these limitations. The matched filter and 
all its derivatives are based on the Neymann-Pearson likelihood ra- 
tio, although their optimal performance is extremely dependent on 
the choice of the acceptance/rejection threshold and on implemen- 
tation details. Despite of their widespread use, the actual practical 
designs of these tools do not yet implement a sound framework to 
handle the uncertainties on the parameter estimates. 

Bayesian methods have the great advantage of providing a 
coherent probability methodology with the option to include, in 
a completely consistent way, all ancillary information. But prob- 
ability theory by itself only gives us a degree of belief. In order 
to produce a catalogue, decisions must be made as well. Decision 
Theory is unambiguous: In [pTpj^pjyJ is the optimal decision tool 



11 The patch set usually contains about 12, 000 7.33° X 7.33° flat patches 
or 3, 000 14.66° x 14.66° instead 
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(in the binary case), although the binary model is manifestly not 
powerful enough to handle a real data set. The necessary extension 
to a multi-model foundation is mandatory for an operational and 
viable solution. 

In PwS we have attempted to implement a fast, multiple model 
decision rule based on the Bayesian ln(odds) device. To achieve 
our goal we focused on taking advantage of the symmetries of the 
multi-channel likelihood manifold to design an efficient, though 
rigorous, exploration tool. Owing to its full, consistent probabil- 
ity foundation, PwS can provide a sound and complete statistical 
characterization of its results. Simultaneously, we can offer effec- 
tive solutions for the difficulties accompanying real data, without 
compromising any of our goals. 
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